ATAC‐seq exposes differences in chromatin accessibility leading to distinct leaf shapes in mulberry

Abstract Mulberry leaf shape is an important agronomic trait indicating yield, growth, development, and habitat variation. China was the earliest country in the world to grow mulberry for sericulture, and it is also one of the great contributions of the Chinese nation to human civilization. ATAC‐seq (Assay for Transposase Accessible Chromatin using sequencing) is a recently developed technique for genome‐wide analysis of chromatin accessibility. The samples used for ATAC sequencing in this study were divided into two groups of whole leaves (CK‐1 and CK‐2) and lobed leaves (HL‐1 and HL‐2), with two replicates in each group. The related motif analysis, differential expression motif screening, and functional annotation of mulberry leaf shape differences were performed by raw letter analysis to finally obtain the transcription factors (TFs) that lead to the production of heteromorphic leaves. These transcription factors are common in plants, especially the TCP family, shown to be associated with leaf development and growth in other woody plants and are a potential transcription factor responsible for leaf shape differences in mulberry. Dissecting the regulatory mechanisms of leaf shape of different forms of mulberry leaves by ATAC‐seq is an important way to protect mulberry germplasm resources and improve mulberry yield. It is conducive to cultivating mulberry varieties with high resistance to adversity, promoting the sustainable development of sericulture, and protecting and improving the ecological environment.

Mulberry leaves are composed of a petiole, leaf blade, and stipules, and are the main assimilation organs and harvests of mulberry trees. In nature, the same plant can display both complete and lobed leaves, and because both are derived from the same genetic background, the differences in their phenotypes are regulated by epigenetics (Itabashi et al., 2018;Zhao & Zhou, 2012). Leaf morphological differences have an important role in variety identification, sericulture production, and natural disease control. Leaf shape is an important agronomic trait in mulberry because the size and shape of mulberry leaves are closely related to economic yield. The leaf shape also reflects differences in energy metabolism, variations in tree habitats, and genetic variation (Cao et al., 2019;Ramesha et al., 2020;Samami et al., 2019). The study on the growth and morphogenesis of mulberry leaves and the mechanisms underlying variation in leaf shape can provide strong support for future mulberry as a model woody plant. The pendant mulberry leaf types, including M. alba Pendula can vary between unlobed and up to 6-7-lobed. Therefore, M. alba Pendula provides a most suitable research material to study epigenetic mechanisms of leaf shape regulation in mulberry.
In recent years, the formation of leaf shape has been studied extensively. Detailed morphological observations of leaf development from meristem cell initials has led to the identification of growth centers and the development of predictive models of polarized growth along proximal-distal, medial-lateral, and adaxial-abaxial axes in simple and complex leaf structures (Diaz, 2017). Some of the genes regulating growth along the different axes and their interplay have been reported (Itakura & Hosoi, 2018). However, the regulatory networks underlying the formation of different leaf shapes from the same genome, such as in mulberry, has received little attention to date. Therefore, the aim of this work is to identify, study, and utilize key TFs governing the formation of different mulberry leaf morphologies. A greater understanding of the regulatory mechanisms underlying the production of different mulberry leaf morphologies is an important way to conserve mulberry germplasm resources and improve mulberry yield. It is also conducive to breeding mulberry varieties with high resistance to adversity, promoting sustainable development of sericulture and protecting and improving the environment.
ATAC-seq is a recently developed technique for genome-wide analysis of chromatin accessibility. Compared with earlier ChIP-seq or DNase-seq methods for detecting chromatin accessibility, ATAC-seq is faster and easier to perform, does not require cross-linking, has a higher signal-to-noise ratio, and can be performed on a smaller number of cells. ATAC-seq takes into account transposase-digested DNA fragments that contain other nucleosome localization information (Lu et al., 2021). ATAC-seq seq is widely used to measure chromatin accessibility and to identify open chromatin regions (OCRs). OCRs usually indicate active regulatory elements in the genome and is directly related to gene regulatory networks. Identification of differentially accessible regions (DARs) between different biological conditions is essential to determine the differential activity of regulatory elements. Differential analysis of ATAC-seq data has many similarities to differential expression analysis of RNA-seq data. However, the distribution of ATAC-seq signal intensity differs from that of RNA-seq data, and DAR identification requires higher sensitivity.
Many different tools can be used to perform differential analysis of ATAC-seq data (Gontarz et al., 2020).
In our study, we have analyzed the motifs associated with leaf shape differences in mulberry through chromatin accessibility assay of and identified the associated TFs through differential expression motif screening, functional enrichment, and annotation, thus providing new information of molecular mechanisms involved in the generation of heteromorphic leaves in mulberry.

| Plant materials
M. alba Pendula were obtained from the Nursery of the National Mulberry Germplasm at Jiangsu University of Science and Technology, Zhenjiang, China. Young whole and lobed leaves were sampled in early April. Furthermore, fresh, young, and actively growing tissues were removed from mulberry trees. Dust or soil from the surface of the material were rinsed with sterile water and blot it up. Large tissues were cut into small pieces or thin slices on ice and then placed into 2 ml screw-cap lyophilization tubes of >400 mg each, labeled with the sample name; quickly submerged in liquid Nitrogen for at least 1 h of snap-freezing and then stored in a À80 C refrigerator for subsequent experiments.

| Flow chart of experiments
Nuclei suspensions were incubated in a Transposition Mix that includes a Transposase. The Transposase enters the nuclei and preferentially fragments the DNA in open regions of the chromatin. Simultaneously, adapter sequences are added to the ends of the DNA fragments. The transposition reaction was incubated at 37 C for 30 min. Immediately following transposition, the products were purified using a QIAGEN mini-elute kit, amplified as described in (Buenrostro et al., 2013), and sequenced using Illumina HiSeqTM 4000 by Gene Denovo Biotechnology Co. (Guangzhou, China).

| Clean reads filtering
Reads obtained from the sequencing machines were filtered to remove adapter sequences, reads containing more than 10% of unknown nucleotides (N) and low quality reads containing more than 50% of low quality (Q value ≤ 20) bases.

| Reads alignment
Bowtie2 (Langmead & Salzberg, 2012) (version 2.2.8) with the parameters"-X 2000 00 was used to align the clean reads from each sample against the reference genome, and reads aligned to the mitochondria or chloroplasts were filtered out. For all data files duplicates were removed using Picard.

| Peak scanning
All reads aligning to the + strand were offset by +4 bps, and all reads aligning to thestrand were offset À5 bps. The shifted, concordantly aligned paired mates were used for peak calling by MACS (version 2.1.2) (Zhang et al., 2008) with the parameters "--nomodel --shift -100 --extsize 200 -B -q 0.05." Dynamic Poisson Distribution was used to calculated The p-value of the specific region based on the unique mapped reads. The region was defined as a peak (accessible DNA region) when the q-value <.05.

| Peak-related genes annotation
According to the genomic location information and gene annotation information of the peak, peak-related genes were confirmed using ChIPseeker (Yu et al., 2015) (version v1.16.1). The position of the peak on the gene or inter-genic regions (such as promoter, 5 0 UTR, 3 0 UTR, exon, intron, downstream and intergenic) was recorded.

| Peak-related genes GO enrichment analysis
Gene ontology (GO) is an international standardized gene functional classification system which offers a dynamic-updated controlled vocabulary and a strictly defined concept to comprehensively describe properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component and biological process. The basic unit of GO is the GO term. Each GO term belongs to a type of ontology. GO enrichment analysis provides all GO terms that significantly enriched in peak-related genes comparing with the genome background, and filter the peak-related genes that correspond to biological functions.
All peak-related genes were mapped to GO terms in the Gene Ontology database (http://www.geneontology.org/) and gene numbers were calculated for every term. Significantly enriched GO terms in peak-related genes comparing with the genome background were defined by the hypergeometric test. The formula for p-value calculation was P=1 À P mÀ1 2.8 | Peak-related genes pathway enrichment analysis KEGG pathway (Kanehisa et al., 2008) enrichment analysis was used to identify significantly enriched metabolic pathways or signal transduction pathways in peak-related genes relative to the whole genome background. The calculating formula is the same as that in GO analysis. Here, N is the number of all transcripts that with KEGG annotation, n is the number of peak-related genes in N, M is the number of all transcripts annotated to specific pathways, and m is number of peak-related genes in M. The calculated p-value was FDR corrected, taking FDR ≤ .05 as a threshold. Pathways meeting this condition were defined as significantly enriched.

| Irreproducible discovery rate (IDR)
To measure consistency between biological replicates within an experiment, Irreproducible discovery rate (IDR) (Li et al., 2011) software (version 2.2) was used with parameters "--input-file-type nar-rowPeak -plot." IDR calculations using scripts provided by the ENCODE project (https://www.encodeproject.org/software/idr/) were performed on all pairs of replicates using an oracle peak list called from merged replicates for each condition. Peaks showing an IDR value of .05 or less were considered relevant.

| Motif analysis
The MEME suit (http://meme-suite.org/) was used to detect motifs in peak sequences. We used MEME-ChIP to scan motifs with high reliability through peak regions, and nd used MEME-AME to confirm the existences of any specific known motifs.

| Differential analysis of multi-samples
The DiffBind (Stark & Brown, 2011) software (version 2.8.0) was used to analyze peak differences across the two groups and significant differential peaks with an FDR of <.05 were selected. Using the same method, genes associated with the different peaks were annotated, and enrichment analysis of GO functions and KEGG pathways were identified.

| Transcription factors validation (TFs) by qRT-PCR
Total RNA was extracted from fresh leaf samples using the RNAiso Plus reagent (Takara, China), according to the manufacturer's protocol.
First, the concentration and purity of total RNA were checked on a remove low-quality data. The filtered data was then compared with the reference genome and after confirming the quality of the comparison, the sequences at the unique position on the comparison were extracted and processed for information analysis.
The samples used for ATAC sequencing in this study were divided into two groups of whole leaves (CK-1 and CK-2) ( Figure 1a) and lobed leaves (HL-1 and HL-2) ( Figure 1b). The Clean reads obtained from the lower machine data after processing CK-1, CK-2, HL-1, and HL-2 were 217924734, 179669554, 198654016, and 181546126, respectively, with the percentage of low quality reads ranging from .17% to .34% (Table 1). Frequency and proportion plots of reads indicate that HQ_Clean_Reads_Num and Adapter account for a significant proportion ( Figure 2). From the distribution of bases before and after data filtering, it can be seen that the quality of filtered data is higher ( Figure S1).

| Comparison analysis
The data were matched to the reference genome using the matching software Bowtie2, and reads that matched to mitochondria or chloroplasts were filtered out (Table S1). Reads matching to a unique position on the genome (unique matched reads) were used for the subsequent information analysis ( Table 2). The unique matched sequences obtained after alignment were analyzed for their coverage distribution on the reference genome, and the depth information of genomic loci was counted to obtain the statistical results of sequencing depth on the genome ( Figure S2). All reads in the 2 kb region upstream and downstream of the transcription start site (TSS) were counted using deepTools (Ramírez et al., 2016)  analysis software (peak calling) with a threshold q-value < .05. The F I G U R E 1 Mulberry leaf shape. (a) Whole leaf; (b) lobed leaf peak sequences were then analyzed to identify their associated genes (Table 3). The distribution of peaks was counted (Table S2) and their genomic distribution classified as either promoter (within 2 k upstream of the gene), 5 0 UTR, 3 0 UTR, Exon, intron, downstream (within 500 bp downstream of the gene) or distal intergenic (beyond 2 k upstream of the gene or beyond 500 bp downstream) (Table S3), and determine the correspondence between the peak and each functional region ( Figure 5).
Where the location of peaks was classified as distal intergenic, their regulatory relationship with the nearest neighboring gene could not be established with any confidence. Therefore, peaks categorized as distal intergenic were eliminated from further consideration. The remaining peak-related genes were used in GO and KO enrichment analyses ( Figures S4 and S5).

| Analysis of TFs associated with intra-group shared peaks
TFs are proteins that bind specific DNA regulatory motifs of genes to regulate their gene transcription. TF-motif analysis was performed using MEME-chip motif discovery of the MEME Suite (http://memesuite.org/). The analysis included an enrichment analysis of known TF (present in the database) and predicted TF motifs. The top 400 peaks of the enrichment multiplicity were taken by default ( Figure 6). Different colors represent different base types, and the height of the letter represents the conservativeness of the base (the higher the letter, the higher its frequency across the locus, the more conserved it is).
Currently, a large number of DNA motifs recuiting TFs have been reported in transcription factor databases (e.g., JASPAR database). Enrichment analyses of these motifs helps us to quickly target the corresponding key TFs, which provides clues for further data parsing and downstream molecular experiments. Here, we have used Analysis of Motif Enrichment (AME) of the MEME Suite to perform enrichment analysis of TF motifs (Table S4). AME uses the JASPAR CORE nonredundant database as a control for input sequence enrichment.  (Table S5), and the experimental results showed consistent trends between the two experimental methods, ATAC-seq and qPCR (Figure 8).  (principal components), while ensuring that as much information as possible was retained in the original data ( Figure 9). The Pearson correlation coefficients calculated between every two samples are presented visually in the form of a heat map ( Figure 10).

| Analysis of peaks and their associated genes shared or unique in simple and lobed leaves
A Venn diagram describes the common and unique peaks present in simple and lobed leaves ( Figure 11). In particular, the peaks specific to each group were associated with the biological regulation specific to the corresponding group. For the peaks that are common or unique among the comparison groups, we further analyze the peakassociated genes. The nearest neighboring gene of each peak were initially considered as the peak-related gene. Considering that when peaks are far away from genes, it is difficult to determine whether there is a regulatory relationship between peaks and genes. Therefore, the peaks classified as distal intergenic were not considered further.
The remaining peak-related genes were subsequently subjected to GO ( Figure S8) and KO enrichment analyses ( Figure S9).

| Analysis of TF motifs exclusively present in simple or lobed leaves
For peaks specific to a particular treatment group, TF motif analysis was performed use using the MEME Suite to predict conserved motifs within the peak region (400 peaks were taken by default). MEME-chip was used to detect new motifs, and the AME package used to detect known motifs. Longer (8-15 bp) and shorter motifs (3-8 bp) were predicted from scratch using MEME software (Figure S10 A&B) and DREME software ( Figure S10C,D), respectively. For the enrichment analysis of known TFs-motifs, the number of grouped TF motifs were 35 and 77 (Table S6).
Grouped enrichment statistics were performed (Table S7) and grouped motif enrichment bubble plots were obtained ( Figure S11).
3.8 | Quantitative analyses of the differential availability of peaks and associated genes between simple and lobed leaves The qualitative analysis of peak and associated genes presented above indicated some TF binding sites that were differentially employed in F I G U R E 9 Sample principal component analysis simple and lobed leaves. However, quantitative differences in the availability of TF binding sites can also reveal potential differential regulations of genes between these two morphologies.
We therefore used DiffBind software to identify the differential availability of peaks between simple and lobed leaf samples, where peak motifs with differences in abundance of log2 values ≥1 and a FDR ≤.05 were selected and their peak-associated genes were identified as described above to provide an indication of the number and types of genes showing differential susceptibility to regulation by their corresponding peak motif in the two morphological leaf types. A statistical plot of genes displaying such significantly differential susceptibilities is presented in (Figure 12).
Genes with differential susceptibilities for regulation by differentially available peak motifs were for subsequently subjected to GO (Figure 13) and KO enrichment analyses (Figure 14).
3.9 | Enrichment analysis of exposed TF-motifs in simple and lobed leaves TF motif analysis for intergroup differential peaks assisted the identification of TFs involved in high and low variation in genomic openness between simple and lobed leaves. The MEME Suite was used as described above. Up-regulated peak motif de novo prediction, using MEME software, from scratch to predict longer motif (8-15 bp) (Figure S12 A) down-regulate the motif denovo prediction of peak. The following is the long motif (8-15 bp) predicted from scratch using MEME software (Figure S12  Figure S13).

| DISCUSSION
The mulberry tree is a very important economic tree species, which is widely planted in the world. Its various organs have different economic or medicinal values, especially mulberry leaves, which are an important food source for silkworms (Ruth et al., 2019). In this study, we have utilized ATAC-seq to analyze differences in chromatin accessibility between lobed and simple leaves of mulberry in order to iden- suggesting their involvement in regulating leaf shape development (Ma et al., 2016). In this study, as shown in Figure 7, there were more TCP families, and most were enriched in the CK group, suggesting the possibility of TCP may be decreased in the HL group, resulting in the change of leaf shape (from whole leaves to lobed leaves). Also, the TCP family analyzed from the down-regulated peaks in the HL group showed similar patterns ( Figure S13B). This further supports the possibility that the expression of the TCP transcription factors family is reduced in cleaved leaves. The results showed that the TCP family is an important transcription factor that leads to different leaf shapes.  The vertical coordinate is GO term, the horizontal coordinate is the enrichment factor (the number of variances in that GO term divided by all the numbers), and the bubble size indicates how many and the darker the color the smaller the Q value). (d) GO enrichment bar graph (constructed using the first 20 GO term with the smallest Q values): The vertical coordinate is the GO term, and the horizontal coordinate is the number of that GO term as a percentage of the number of all variances. The darker the color, the smaller the Q value, and the values depicted on the bars are the number of GO term and the Q value. Different colors represent different A classes. Second circle: the number of pathways in the background genes and the Q value. The more genes the longer the bar, the smaller the Q value the redder the color. Third circle: the number of genes differing in the pathway. Fourth circle: the RichFactor value of each pathway (the number of differences in the pathway divided by the number of all), background grid lines, each cell represents .1 RichFactor unit). The vertical coordinate is the pathway, the horizontal coordinate is the enrichment factor (the number of differences in the pathway divided by the number of all), the size indicates the number of how much, and the more red the color the smaller the Q value. The darker the color the smaller the Q value, and the value on the bar is the number of that pathway and the Q value.

| CONCLUSION
In this study, we used ATAC-seq to mine the TFs that lead to leaf shape differences, detected and counted single samples and intragroup peaks respectively by bioinformatics analysis and performed TF motif analysis to provide insight the TFs related to heteromorphic leaf occurrence. This study has laid the foundation for an in-depth study to understand the mechanisms underlying leaf shape changes in mulberry.

CONFLICT OF INTEREST
The Authors did not report any conflict of interest.